Cross Table Cubing: Mining Iceberg Cubes from Data Warehouses
Abstract
All existing (iceberg) cube computation algorithms assume that the data is stored in a single base table. In practice, however, a data warehouse is often organized as a schema of multiple tables, such as a star schema or a snowflake schema. Materializing a universal base table by joining multiple tables is often very expensive, or even unaffordable, in real data warehouses in terms of both computation time and space. In this paper, we investigate the problem of computing iceberg cubes from data warehouses. Surprisingly, our study shows that computing iceberg cubes directly from multiple tables can be even more efficient in both space and runtime than computing them from a materialized universal base table. We develop an efficient algorithm, CTC (for Cross Table Cubing), to tackle the problem. An extensive performance study on synthetic data sets demonstrates that our new approach is efficient and scalable for large data warehouses.
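To make the baseline concrete, the following is a minimal sketch of the naive approach the abstract argues against: join the tables into a universal base table, then keep every group-by cell whose count meets a minimum support. It is not the CTC algorithm. The toy star schema, the names `fact`, `store_dim`, `universal_table`, `iceberg_cube`, and the threshold `min_sup` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy star schema (not from the paper):
# a fact table whose rows reference one dimension table by store_id.
fact = [(1, "pen"), (1, "pen"), (2, "pen"), (2, "ink"), (3, "ink")]
store_dim = {
    1: ("Vancouver", "BC"),
    2: ("Burnaby", "BC"),
    3: ("Seattle", "WA"),
}  # store_id -> (city, state)


def universal_table(fact, store_dim):
    """Join the fact table with the dimension table into one base table.

    This materialization is the step that becomes expensive at scale.
    """
    return [(product,) + store_dim[store_id] for store_id, product in fact]


def iceberg_cube(rows, dims, min_sup):
    """Return every group-by cell whose COUNT(*) is at least min_sup.

    Naive enumeration over all subsets of dimensions, with no pruning.
    """
    cells = {}
    for k in range(len(dims) + 1):
        for subset in combinations(range(len(dims)), k):
            counts = Counter(tuple(row[i] for i in subset) for row in rows)
            for values, count in counts.items():
                if count >= min_sup:
                    key = tuple((dims[i], v) for i, v in zip(subset, values))
                    cells[key] = count
    return cells


cube = iceberg_cube(
    universal_table(fact, store_dim), ("product", "city", "state"), min_sup=3
)
for cell, count in cube.items():
    print(cell, count)
```

On this toy data only four cells survive the threshold: the empty (all) cell, product = pen, state = BC, and their combination. The sketch joins first and enumerates all 2^d group-bys, which is exactly the cost that computing directly on the multiple tables aims to avoid.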
Similar resources
Computing Complex Iceberg Cubes by Multiway Aggregation and Bounding
Iceberg cubing is a valuable technique in data warehouses. The efficiency of iceberg cube computation comes from efficient aggregation and effective pruning for constraints. In advanced applications, iceberg constraints are often non-monotone and complex, for example, “Average cost in the range [δ1, δ2] and standard deviation of cost less than β”. The current cubing algorithms either are effici...
MDAG-Cubing: A Reduced Star-Cubing Approach
In this paper, we extend the Star-Cubing approach by introducing a new hybrid dimension-based approach to efficiently compute full or iceberg cubes with simple or complex measures. This new approach, named Multidimensional Direct Acyclic Graph Cubing (MDAG-Cubing), introduces the notion of external and internal nodes to reduce the cube representation without loss of generality. The reduced repr...
High-dimensional Hierarchical OLAP: A Prefix-Index Hierarchical Cubing Approach
The pre-computation of data cubes is critical for improving the response time of OLAP (online analytical processing) systems and accelerating data mining tasks in large data warehouses. However, as the sizes of data warehouses grow, the time it takes to perform this pre-computation becomes a significant performance bottleneck. In high-dimensional OLAP, it might not be practical to build all th...
OLAP Formulations for Supporting Complex Spatial Objects in Data Warehouses
In recent years, there has been a large increase in the amount of spatial data obtained from remote sensing, GPS receivers, communication terminals and other domains. Data warehouses help in modeling and mining large amounts of data from heterogeneous sources over an extended period of time. However, incorporating spatial data into data warehouses leads to several challenges in data modeling, ma...
Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration
Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down and bottom-up. The former, represented by the MultiWay Array Cube (called MultiWay) algorithm [25], aggregates simultaneously on multiple dimensions; however, it cannot take advantage of Apriori pruning [2] when computing iceberg cubes (c...